NSF PAR Search | NSF Public Access Repository

Deep Learning Methods for Sign Language Translation

https://doi.org/10.1145/3477498

Ananthanarayana, Tejaswini; Srivastava, Priyanshu; Chintha, Akash; Santha, Akhil; Landy, Brian; Panaro, Joseph; Webster, Andre; Kotecha, Nikunj; Sah, Shagan; Sarchet, Thomastine; et al. (December 2021, ACM Transactions on Accessible Computing)

Many sign languages are bona fide natural languages with grammatical rules and lexicons, and hence can benefit from machine translation methods. Similarly, since sign language is a visual-spatial language, it can also benefit from computer vision methods for encoding it. With the advent of deep learning methods in recent years, significant advances have been made in natural language processing (specifically neural machine translation) and in computer vision methods (specifically image and video captioning). Researchers have therefore begun expanding these learning methods to sign language understanding. Sign language interpretation is especially challenging, because it involves a continuous visual-spatial modality where meaning is often derived based on context. The focus of this article, therefore, is to examine various deep learning–based methods for encoding sign language as inputs, and to analyze the efficacy of several machine translation methods, over three different sign language datasets. The goal is to determine which combinations are sufficiently robust for sign language translation without any gloss-based information. To understand the role of the different input features, we perform ablation studies over the model architectures (input features + neural translation models) for improved continuous sign language translation. These input features include body and finger joints, facial points, as well as vector representations/embeddings from convolutional neural networks. The machine translation models explored include several baseline sequence-to-sequence approaches, more complex and challenging networks using attention, reinforcement learning, and the transformer model. We implement the translation methods over multiple sign languages: German (GSL), American (ASL), and Chinese (CSL). From our analysis, the transformer model combined with input embeddings from ResNet50 or pose-based landmark features outperformed all the other sequence-to-sequence models by achieving higher BLEU2-BLEU4 scores when applied to the controlled and constrained GSL benchmark dataset. These combinations also showed significant promise on the other, less controlled ASL and CSL datasets.
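For readers unfamiliar with the BLEU2-BLEU4 scores mentioned in this abstract, the following is a minimal sketch of how a BLEU-n score can be computed for one hypothesis against one reference. It is an illustration only, not the authors' evaluation code: the function names `ngram_counts` and `bleu` are hypothetical, the sketch is sentence-level with a single reference and no smoothing, and published results are typically computed at corpus level with an established toolkit.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(reference: str, hypothesis: str, max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU-max_n against a single reference.

    Assumption: whitespace tokenization and uniform n-gram weights.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not hyp:
        return 0.0

    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp, n)
        ref_counts = ngram_counts(ref, n)
        total = sum(hyp_counts.values())
        # Clip each hypothesis n-gram count by its count in the reference.
        matched = sum(min(count, ref_counts[gram])
                      for gram, count in hyp_counts.items())
        if total == 0 or matched == 0:
            # Without smoothing, one empty precision zeroes the whole score.
            return 0.0
        log_precision_sum += math.log(matched / total) / max_n

    # Brevity penalty: penalize hypotheses shorter than the reference.
    if len(hyp) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(hyp))
    return brevity_penalty * math.exp(log_precision_sum)


if __name__ == "__main__":
    ref = "the cat is on the mat"
    hyp = "the cat sat on the mat"
    print("BLEU-2:", bleu(ref, hyp, max_n=2))
    print("BLEU-4:", bleu(ref, hyp, max_n=4))
```

In this toy pair the hypothesis shares no 4-gram with the reference, so the unsmoothed BLEU-4 collapses to zero while BLEU-2 does not. This is one reason results are often reported across several n-gram orders.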
Full Text Available
Fully Convolutional ASR for Less-Resourced Endangered Languages

Thai, Bao; Jimerson, Robert; Ptucha, Raymond; Prud’hommeaux, Emily (May 2020, Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL))

The application of deep learning to automatic speech recognition (ASR) has yielded dramatic accuracy increases for languages with abundant training data, but languages with limited training resources have yet to see accuracy improvements on this scale. In this paper, we compare a fully convolutional approach for acoustic modeling in ASR with a variety of established acoustic modeling approaches. We evaluate our method on Seneca, a low-resource endangered language spoken in North America. Our method yields word error rates up to 40% lower than those reported using both standard GMM-HMM approaches and established deep neural methods, with a substantial reduction in training time. These results show particular promise for languages like Seneca that are both endangered and lack extensive documentation.
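As background for the word error rate (WER) figures in this abstract, here is a minimal sketch of the usual definition: the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The function names are hypothetical and this is not the authors' scoring pipeline. The abstract also does not state whether "lower" is meant in relative or absolute terms, so the `relative_reduction` helper shows the relative reading purely as an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer is lower than baseline_wer (assumed reading)."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    print(relative_reduction(0.5, 0.3))
```

Note that WER can exceed 1.0 when the hypothesis contains many inserted words, because the denominator is the reference length only.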
Full Text Available
YOLOrs: Object Detection in Multimodal Remote Sensing Imagery

https://doi.org/10.1109/JSTARS.2020.3041316

Sharma, Manish; Dhanaraj, Mayur; Karnam, Srivallabha; Chachlakis, Dimitris G.; Ptucha, Raymond; Markopoulos, Panos P.; Saber, Eli (January 2021, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing)
Full Text Available
Capturing Laughter and Smiles under Genuine Amusement vs. Negative Emotion

https://doi.org/10.1109/PerComWorkshops48775.2020.9156102

Forman, Cleo; Thiel, Pablo; Ptucha, Raymond; Dominguez, Miguel; Alm, Cecilia O. (March 2020, 2020 Workshop on Human-Centered Computational Sensing - IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops))

Smiling and laughter are typically associated with amusement. If they occur under negative emotions, systems responding naively may confuse an uncomfortable smile or laugh with an amused state. We present a passive text and video elicitation task and collect spontaneous laughter and smiles in reaction to amusing and negative experiences, using standard, ubiquitous sensors (webcam and microphone), along with participant self-ratings. While we rely on a state-of-the-art smile recognizer, for laughter recognition our transfer learning architecture enhanced on modest data outperforms other models with up to 85% accuracy (F1 = 0.86), suggesting this technique as promising for improving affect models. Subsequently, we analyze and automatically predict laughter as amused vs. negative. However, contrasting with prior findings for acted data, for this spontaneously elicited dataset classifying laughter by emotional valence is not satisfactory.
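The abstract reports both accuracy and an F1 score for laughter recognition. As a reminder of how the two measures differ, the sketch below computes them from binary labels. The function name `binary_metrics` and the toy labels are hypothetical and are not taken from the paper's data.

```python
def binary_metrics(true_labels, predicted_labels, positive=1):
    """Return accuracy, precision, recall, and F1 for a binary task."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("label sequences must not be empty")

    tp = fp = fn = tn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1

    accuracy = (tp + tn) / len(true_labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy example: 1 = laughter, 0 = no laughter (made-up labels).
    truth = [1, 1, 1, 0, 0, 0, 1, 0]
    preds = [1, 1, 0, 0, 0, 1, 1, 0]
    print(binary_metrics(truth, preds))
```

Accuracy counts all correct decisions, whereas F1 ignores true negatives, so the two can diverge when the classes are imbalanced.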
Full Text Available
Synthetic Data Augmentation for Improving Low-Resource ASR

https://doi.org/10.1109/WNYIPW.2019.8923082

Thai, Bao; Jimerson, Robert; Arcoraci, Dominic; Prud'hommeaux, Emily; Ptucha, Raymond (October 2019, 2019 IEEE Western New York Image and Signal Processing Workshop (WNYISPW))

Although the application of deep learning to automatic speech recognition (ASR) has resulted in dramatic reductions in word error rate for languages with abundant training data, ASR for languages with few resources has yet to benefit from deep learning to the same extent. In this paper, we investigate various methods of acoustic modeling and data augmentation with the goal of improving the accuracy of a deep learning ASR framework for a low-resource language with a high baseline word error rate. We compare several methods of generating synthetic acoustic training data via voice transformation and signal distortion, and we explore several strategies for integrating this data into the acoustic training pipeline. We evaluate our methods on an indigenous language of North America with minimal training resources. We show that training initially via transfer learning from an existing high-resource language acoustic model, refining weights using a heavily concentrated synthetic dataset, and finally fine-tuning to the target language using limited synthetic data reduces WER by 15% over just transfer learning using deep recurrent methods. Further, we show improvements over traditional frameworks by 19% using a similar multistage training with deep convolutional approaches.
Full Text Available
Multimodal Anticipated versus Actual Perceptual Reactions

https://doi.org/10.1145/3351529.3360663

Saraf, Monali; Roberts, Tyrell; Ptucha, Raymond; Homan, Christopher; Alm, Cecilia Ovesdotter (January 2019, Adjunct of the 2019 International Conference on Multimodal Interaction)

Full Text Available
Sensing and Learning Human Annotators Engaged in Narrative Sensemaking

https://doi.org/10.18653/v1/N18-4019

Tornblad, McKenna; Lapresi, Luke; Homan, Christopher; Ptucha, Raymond; Ovesdotter Alm, Cecilia (June 2018, North American Chapter of the Association for Computational Linguistics - Human Language Technology: Student Research Workshop)

While labor issues and quality assurance in crowdwork are increasingly studied, how annotators make sense of texts and how they are personally impacted by doing so are not. We study these questions via a narrative-sorting annotation task, where carefully selected (by sequentiality, topic, emotional content, and length) collections of tweets serve as examples of everyday storytelling. As readers process these narratives, we measure their facial expressions, galvanic skin response, and self-reported reactions. From the perspective of annotator well-being, a reassuring outcome was that the sorting task did not cause a measurable stress response; however, readers reacted to humor. In terms of sensemaking, readers were more confident when sorting sequential, target-topical, and highly emotional tweets. As crowdsourcing becomes more common, this research sheds light on the perceptive capabilities and emotional impact of human readers.
Full Text Available
Understanding the Semantics of Narratives of Interpersonal Violence through Reader Annotations and Physiological Reactions

Calderwood, Alexander; Pruett, Elizabeth; Ptucha, Raymond; Homan, Christopher; Alm, Cecilia O. (January 2017, Proceedings of the Workshop Computational Semantics Beyond Events and Roles)

Interpersonal violence (IPV) is a prominent sociological problem that affects people of all demographic backgrounds. By analyzing how readers interpret, perceive, and react to experiences narrated in social media posts, we explore an understudied source for discourse about abuse. We asked readers to annotate Reddit posts about relationships with vs. without IPV for stakeholder roles and emotion, while measuring their galvanic skin response (GSR), pulse, and facial expression. We map annotations to coreference resolution output to obtain a labeled coreference chain for stakeholders in texts, and apply automated semantic role labeling for analyzing IPV discourse. Findings provide insights into how readers process roles and emotion in narratives. For example, abusers tend to be linked with violent actions and certain affect states. We train classifiers to predict stakeholder categories of coreference chains. We also find that subjects' GSR noticeably changed for IPV texts, suggesting that co-collected measurement-based data about annotators can be used to support text annotation.
Full Text Available
